Bayesian Agglomerative Clustering with Coalescents
Authors
Abstract
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics.
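The prior named in the abstract, Kingman's coalescent, can be illustrated with a short generative sketch. It assumes the standard definition: starting from n leaves and moving backwards in time, each pair of lineages merges at rate 1, so with k lineages remaining the waiting time to the next merge is exponential with rate k(k-1)/2 and the merging pair is chosen uniformly. The function name and the return format are hypothetical choices made for illustration. The sketch only samples a tree from the prior; it is not the greedy or sequential Monte Carlo inference the paper develops.

```python
import random


def sample_kingman_coalescent(n, seed=None):
    """Sample a binary tree over n leaves from Kingman's coalescent prior.

    Returns a list of (time, left, right, parent) merge events. Time runs
    backwards from the leaves (time 0). Leaves are numbered 0..n-1 and
    internal nodes n..2n-2 (a hypothetical labelling for this sketch).
    """
    rng = random.Random(seed)
    active = list(range(n))  # lineages that have not merged yet
    next_id = n
    t = 0.0
    merges = []
    while len(active) > 1:
        k = len(active)
        # Each of the k*(k-1)/2 pairs coalesces at rate 1, so the time to
        # the next merge is exponential with the total rate.
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
        # The merging pair is uniform over all pairs of active lineages.
        i, j = rng.sample(range(k), 2)
        left, right = active[i], active[j]
        # Remove the two children (higher index first) and add the parent.
        for idx in sorted((i, j), reverse=True):
            active.pop(idx)
        active.append(next_id)
        merges.append((t, left, right, next_id))
        next_id += 1
    return merges


if __name__ == "__main__":
    for event in sample_kingman_coalescent(5, seed=0):
        print(event)
```

A tree over n leaves always has n - 1 merge events, and the merge times increase as the process moves away from the leaves, which is what makes a bottom-up agglomerative procedure a natural fit for this prior.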
Similar resources
Agglomerative Clustering of Bagged Data Using Joint Distributions
Current methods for hierarchical clustering of data either operate on features of the data or make limiting model assumptions. We present the hierarchy discovery algorithm (HDA), a model-based hierarchical clustering method based on explicit comparison of joint distributions via Bayesian network learning for predefined groups of data. HDA works on both continuous and discrete data and offers a ...
Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility
Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome an...
Randomized Algorithms for Fast Bayesian Hierarchical Clustering
We present two new algorithms for fast Bayesian Hierarchical Clustering on large data sets. Bayesian Hierarchical Clustering (BHC) [1] is a method for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. BHC has several advantages over traditional distance-based agglomerative clustering algorithms. It defines a probabilistic model of the data a...
Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering,...
Discovering Dynamics Using Bayesian Clustering
This paper introduces a Bayesian method for clustering dynamic processes and applies it to the characterization of the dynamics of a military scenario. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing the different dynamics. To increase efficiency, the method uses an entropy-based heuristic se...